Plotting Map Data

Haley Jeppson, Joe Papio, Sam Tyner

June 13, 2017

Loading Required Packages

library(dplyr)
library(albersusa)
library(sf)

States Data

To make a map, let’s load up the states data and take a look:

library(ggplot2)  # dplyr, albersusa, and sf are already loaded above
states <- usa_sf("laea")
glimpse(states)
## Observations: 51
## Variables: 14
## $ geo_id              <fctr> 0400000US04, 0400000US05, 0400000US06, 04...
## $ fips_state          <fctr> 04, 05, 06, 08, 09, 11, 13, 17, 18, 22, 2...
## $ name                <fctr> Arizona, Arkansas, California, Colorado, ...
## $ lsad                <fctr> , , , , , , , , , , , , , , , , , , , , ,...
## $ census_area         <dbl> 113594.084, 52035.477, 155779.220, 103641....
## $ iso_3166_2          <fctr> AZ, AR, CA, CO, CT, DC, GA, IL, IN, LA, M...
## $ census              <int> 6392017, 2915918, 37253956, 5029196, 35740...
## $ pop_estimataes_base <int> 6392310, 2915958, 37254503, 5029324, 35740...
## $ pop_2010            <int> 6411999, 2922297, 37336011, 5048575, 35793...
## $ pop_2011            <int> 6472867, 2938430, 37701901, 5119661, 35905...
## $ pop_2012            <int> 6556236, 2949300, 38062780, 5191709, 35943...
## $ pop_2013            <int> 6634997, 2958765, 38431393, 5272086, 35993...
## $ pop_2014            <int> 6731484, 2966369, 38802500, 5355866, 35966...
## $ geometry            <simple_feature> MULTIPOLYGON(((-1111065.933...,...

Basic Map Data

What needs to be in the data set in order to plot a basic map?

  • Need latitude/longitude points for all map boundaries
  • Need to know which boundary group each lat/long point belongs to
  • Need to know the order to connect points within each group
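
The three requirements above can be sketched with a toy data frame. The column names long, lat, group, and order, and the two made-up squares, are assumptions for illustration only; they are not part of the states data used in these slides.

```r
library(ggplot2)

# Two hypothetical unit squares; each row is one boundary point.
toy_map <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("A", "B"), each = 4),  # which boundary a point belongs to
  order = rep(1:4, times = 2)          # order in which to connect the points
)

# Sort by group and order so the points are connected correctly,
# then draw one polygon per group.
toy_map <- toy_map[order(toy_map$group, toy_map$order), ]
p_toy <- ggplot(toy_map, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black")
p_toy
```

Simple features, introduced next, bundle these three pieces of information into a single geometry column.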

Simple Features

  • A feature is a real-world object, and it can consist of other objects.
    • e.g. a car is an object that consists of tires, an engine, etc.
  • Features have a geometry, which describes where the object is located, and attributes, which describe its other properties
  • Map data are MULTIPOLYGON simple feature objects
  • Borders of states, countries, etc. form POLYGON objects
    • e.g. each US state has a geometry that describes where in the world it is, plus attributes such as population and land area

See vignette("sf1", package = "sf") for more
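
To make the idea of "geometry plus attributes" concrete, here is a minimal sketch that builds a one-feature sf object by hand. The square, the name "toyland", and the population value are all made up for illustration.

```r
library(sf)

# A hypothetical square "state"; the ring must be closed (first point == last point).
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
square <- st_polygon(list(ring))

# Combine the geometry with attributes to form a simple feature collection.
toy_sf <- st_sf(
  name     = "toyland",
  pop      = 1000L,
  geometry = st_sfc(square, crs = 4326)  # WGS84 longitude/latitude
)

st_geometry_type(toy_sf)  # geometry type of each feature
toy_sf                    # prints the attributes alongside the geometry column
```

The states data follows the same layout, with one row per state and the boundary stored in the geometry column.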

Plotting Simple Features

Very easy with geom_sf()

ggplot(data = states) + geom_sf()

Adding State Attributes

We want to incorporate additional information into the plot:

  • Add other geographic information by adding geometric layers to the plot
  • Add non-geographic information by altering the fill color for each state
    • Map a variable to the fill aesthetic, e.g. geom_sf(aes(fill = ...)), to color each state as a solid shape
    • Incorporate numeric information using color shade or intensity
    • Incorporate categorical information using color hue

Categorical Information Using Hue

If a categorical variable is mapped to the fill color, ggplot2 assigns a different hue to each category. Let’s load a state regions dataset:

statereg <- read.csv("http://heike.github.io/rwrks/02-r-graphics/data/statereg.csv", stringsAsFactors = FALSE)

glimpse(statereg)
## Observations: 51
## Variables: 2
## $ State       <chr> "california", "nevada", "oregon", "washington", "i...
## $ StateGroups <chr> "West", "West", "West", "West", "West", "West", "W...

Joining Data

We need to join or merge our original states data with this new state info. We can use the left_join function to do so (more on this later):

states$name <- tolower(states$name)
states.class.map <- left_join(states, statereg, by = c("name" = "State"))
glimpse(states.class.map)
## Observations: 51
## Variables: 15
## $ geo_id              <fctr> 0400000US04, 0400000US05, 0400000US06, 04...
## $ fips_state          <fctr> 04, 05, 06, 08, 09, 11, 13, 17, 18, 22, 2...
## $ name                <chr> "arizona", "arkansas", "california", "colo...
## $ lsad                <fctr> , , , , , , , , , , , , , , , , , , , , ,...
## $ census_area         <dbl> 113594.084, 52035.477, 155779.220, 103641....
## $ iso_3166_2          <fctr> AZ, AR, CA, CO, CT, DC, GA, IL, IN, LA, M...
## $ census              <int> 6392017, 2915918, 37253956, 5029196, 35740...
## $ pop_estimataes_base <int> 6392310, 2915958, 37254503, 5029324, 35740...
## $ pop_2010            <int> 6411999, 2922297, 37336011, 5048575, 35793...
## $ pop_2011            <int> 6472867, 2938430, 37701901, 5119661, 35905...
## $ pop_2012            <int> 6556236, 2949300, 38062780, 5191709, 35943...
## $ pop_2013            <int> 6634997, 2958765, 38431393, 5272086, 35993...
## $ pop_2014            <int> 6731484, 2966369, 38802500, 5355866, 35966...
## $ StateGroups         <chr> "Southwest", "South", "West", "West", "Nor...
## $ geometry            <simple_feature> MULTIPOLYGON(((-1111065.933...,...
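
A left join keeps every row of the map data and fills in NA where a name has no partner, so it can be worth checking for unmatched keys. The sketch below uses small made-up tables (map_names and regions are hypothetical stand-ins, and the mismatch is invented for illustration) with anti_join() from dplyr.

```r
library(dplyr)

# Hypothetical stand-ins for the map data and the lookup table.
map_names <- data.frame(name = c("iowa", "ohio", "district of columbia"),
                        stringsAsFactors = FALSE)
regions   <- data.frame(State = c("iowa", "ohio", "washington dc"),
                        StateGroups = c("Midwest", "Midwest", "South"),
                        stringsAsFactors = FALSE)

joined <- left_join(map_names, regions, by = c("name" = "State"))

# Rows of the map data that found no partner in the lookup table.
unmatched <- anti_join(map_names, regions, by = c("name" = "State"))
unmatched$name
```

Any name listed in unmatched would show up as an NA fill on the map.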

Plotting the Result

ggplot(data = states.class.map) + geom_sf(aes(fill = StateGroups))

Numerical Information Using Shade and Intensity

To show how we can add numerical information to map plots, we will use the BRFSS data:

  • Behavioral Risk Factor Surveillance System
  • 2008 telephone survey run by the Centers for Disease Control and Prevention (CDC)
  • Asks a variety of questions related to health and wellness
  • Cleaned data with state-aggregated values are posted on the website

BRFSS Data Aggregated by State

states.stats <- read.csv("http://heike.github.io/rwrks/02-r-graphics/data/states.stats.csv", stringsAsFactors = FALSE)
glimpse(states.stats)
## Observations: 54
## Variables: 6
## $ state.name  <chr> "alabama", "alaska", "arizona", "arkansas", "calif...
## $ avg.wt      <dbl> 180.7247, 189.2756, 169.6867, 177.3663, 170.0464, ...
## $ avg.qlrest2 <dbl> 9.051282, 8.380952, 5.770492, 8.226619, 6.847751, ...
## $ avg.ht      <dbl> 168.0310, 172.0992, 168.2616, 168.7958, 168.1314, ...
## $ avg.bmi     <dbl> 29.00222, 28.90572, 27.04900, 28.02310, 27.23330, ...
## $ avg.drnk    <dbl> 2.333333, 2.323529, 2.406897, 2.312500, 2.170000, ...

We must join these data to the map data as well:

states.map <- left_join(states, states.stats, by = c("name" = "state.name"))
glimpse(states.map)
## Observations: 51
## Variables: 19
## $ geo_id              <fctr> 0400000US04, 0400000US05, 0400000US06, 04...
## $ fips_state          <fctr> 04, 05, 06, 08, 09, 11, 13, 17, 18, 22, 2...
## $ name                <chr> "arizona", "arkansas", "california", "colo...
## $ lsad                <fctr> , , , , , , , , , , , , , , , , , , , , ,...
## $ census_area         <dbl> 113594.084, 52035.477, 155779.220, 103641....
## $ iso_3166_2          <fctr> AZ, AR, CA, CO, CT, DC, GA, IL, IN, LA, M...
## $ census              <int> 6392017, 2915918, 37253956, 5029196, 35740...
## $ pop_estimataes_base <int> 6392310, 2915958, 37254503, 5029324, 35740...
## $ pop_2010            <int> 6411999, 2922297, 37336011, 5048575, 35793...
## $ pop_2011            <int> 6472867, 2938430, 37701901, 5119661, 35905...
## $ pop_2012            <int> 6556236, 2949300, 38062780, 5191709, 35943...
## $ pop_2013            <int> 6634997, 2958765, 38431393, 5272086, 35993...
## $ pop_2014            <int> 6731484, 2966369, 38802500, 5355866, 35966...
## $ avg.wt              <dbl> 169.6867, 177.3663, 170.0464, 167.1702, 16...
## $ avg.qlrest2         <dbl> 5.770492, 8.226619, 6.847751, 8.134715, 8....
## $ avg.ht              <dbl> 168.2616, 168.7958, 168.1314, 169.6110, 16...
## $ avg.bmi             <dbl> 27.04900, 28.02310, 27.23330, 26.16552, 26...
## $ avg.drnk            <dbl> 2.406897, 2.312500, 2.170000, 1.970501, 1....
## $ geometry            <simple_feature> MULTIPOLYGON(((-1111065.933...,...

Shade and Intensity

Average number of days of insufficient sleep in the last 30 days, by state

ggplot(data = states.map) + geom_sf(aes(fill = avg.qlrest2))

BRFSS Data Aggregated by State and Sex

states.sex.stats <- read.csv("http://heike.github.io/rwrks/02-r-graphics/data/states.sex.stats.csv", stringsAsFactors = FALSE)
glimpse(states.sex.stats)
## Observations: 108
## Variables: 8
## $ state.name  <chr> "alabama", "alabama", "alaska", "alaska", "arizona...
## $ SEX         <int> 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1,...
## $ avg.wt      <dbl> 198.8936, 173.0315, 203.3919, 169.5660, 191.3739, ...
## $ avg.qlrest2 <dbl> 8.648936, 9.224771, 7.236111, 9.907407, 5.163793, ...
## $ avg.ht      <dbl> 177.5729, 163.9956, 178.3896, 163.1296, 177.1724, ...
## $ avg.bmi     <dbl> 28.50714, 29.21280, 28.91494, 28.89286, 27.63152, ...
## $ avg.drnk    <dbl> 3.033333, 2.041667, 2.487179, 2.103448, 2.814286, ...
## $ sex         <chr> "Male", "Female", "Male", "Female", "Male", "Femal...

One More Join

states.sex.map <- left_join(states, states.sex.stats, by = c("name" = "state.name"))
glimpse(states.sex.map)
## Observations: 102
## Variables: 21
## $ geo_id              <fctr> 0400000US04, 0400000US04, 0400000US05, 04...
## $ fips_state          <fctr> 04, 04, 05, 05, 06, 06, 08, 08, 09, 09, 1...
## $ name                <chr> "arizona", "arizona", "arkansas", "arkansa...
## $ lsad                <fctr> , , , , , , , , , , , , , , , , , , , , ,...
## $ census_area         <dbl> 113594.084, 113594.084, 52035.477, 52035.4...
## $ iso_3166_2          <fctr> AZ, AZ, AR, AR, CA, CA, CO, CO, CT, CT, D...
## $ census              <int> 6392017, 6392017, 2915918, 2915918, 372539...
## $ pop_estimataes_base <int> 6392310, 6392310, 2915958, 2915958, 372545...
## $ pop_2010            <int> 6411999, 6411999, 2922297, 2922297, 373360...
## $ pop_2011            <int> 6472867, 6472867, 2938430, 2938430, 377019...
## $ pop_2012            <int> 6556236, 6556236, 2949300, 2949300, 380627...
## $ pop_2013            <int> 6634997, 6634997, 2958765, 2958765, 384313...
## $ pop_2014            <int> 6731484, 6731484, 2966369, 2966369, 388025...
## $ SEX                 <int> 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, ...
## $ avg.wt              <dbl> 191.3739, 156.2054, 202.2959, 163.4057, 18...
## $ avg.qlrest2         <dbl> 5.163793, 6.142857, 6.542553, 9.086957, 6....
## $ avg.ht              <dbl> 177.1724, 162.7043, 178.4388, 163.7151, 17...
## $ avg.bmi             <dbl> 27.63152, 26.67683, 28.78676, 27.59545, 27...
## $ avg.drnk            <dbl> 2.814286, 2.026667, 2.789474, 2.000000, 2....
## $ sex                 <chr> "Male", "Female", "Male", "Female", "Male"...
## $ geometry            <simple_feature> MULTIPOLYGON(((-1111065.933...,...
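
The joined data has 102 rows rather than 51 because each state matches two rows of states.sex.stats, one per sex, while names that do not appear in the map data are dropped. A minimal sketch of this behavior, using made-up tables (left, right, and the "guam" entry are hypothetical):

```r
library(dplyr)

# Hypothetical tables: one row per state on the left,
# one row per state and sex on the right (plus an entry not in the map data).
left  <- data.frame(name = c("iowa", "ohio"), stringsAsFactors = FALSE)
right <- data.frame(
  state.name = rep(c("iowa", "ohio", "guam"), each = 2),
  sex        = rep(c("Male", "Female"), times = 3),
  stringsAsFactors = FALSE
)

both <- left_join(left, right, by = c("name" = "state.name"))
nrow(both)         # 2 states x 2 sexes = 4 rows
count(both, name)  # each state now appears twice
```

Because every geometry is repeated once per sex, faceting by sex draws a complete map in each panel.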

Adding Information

Average number of alcoholic drinks per day by state and sex

ggplot(data = states.sex.map) + geom_sf(aes(fill = avg.drnk)) +
  facet_grid(.~sex)

Your Turn

  • Use left_join to combine child healthcare data with the map data. You can load the child healthcare data with:
states.health.stats <- read.csv("http://heike.github.io/rwrks/02-r-graphics/data/states.health.stats.csv", stringsAsFactors = FALSE)
  • Use geom_sf to create a map of child healthcare undercoverage rate by state

Cleaning Up Your Maps

Use ggplot2 options to clean up your map!

  • Add a title + ggtitle(...)
  • Might want a plain white background + theme_bw()
  • Extremely familiar geography may eliminate the need for latitude and longitude axes + theme(...)
  • Want to customize the color gradient + scale_fill_gradient(...) or scale_fill_gradient2(...)
  • Keep aspect ratios correct + coord_sf() (added automatically by geom_sf())

Cleaned Up Map

ggplot(data = states.map) + geom_sf(aes(fill = avg.drnk)) +
  theme_bw() +
  # Two-color gradient; scale_fill_gradient2() diverges around midpoint = 0 by
  # default, so its low color would never appear for values in [1.5, 3]
  scale_fill_gradient(limits = c(1.5, 3), low = "lightgray", high = "red") +
  # Hide axis ticks, labels, and titles on both axes
  theme(axis.ticks = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank()) +
  ggtitle("Map of Average Number of Alcoholic Beverages\nConsumed Per Day by State")
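
As an alternative to blanking each axis element by hand, ggplot2's theme_void() removes axes, grid lines, and the background in one call. The sketch below is a possible variation rather than the approach used in these slides; it runs on two made-up square "states" (toy, make_square, and the values are hypothetical) so that it does not depend on the downloaded data.

```r
library(ggplot2)
library(sf)

# Helper that builds a closed unit square starting at x = x0 (illustrative only).
make_square <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two hypothetical square "states" with a made-up numeric attribute.
toy <- st_sf(
  value    = c(1.8, 2.6),
  geometry = st_sfc(make_square(0), make_square(2), crs = 4326)
)

p_void <- ggplot(data = toy) +
  geom_sf(aes(fill = value)) +
  scale_fill_gradient(limits = c(1.5, 3), low = "lightgray", high = "red") +
  theme_void() +  # drops axes, grid lines, and background in one call
  ggtitle("Toy map with theme_void()")
p_void
```

The same theme_void() call could be swapped in for theme_bw() and theme(...) on the states map.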

Your Turn

Use options to polish the look of your map of child healthcare undercoverage rate by state!